feat: dynamic egress network rule updates for running sandboxes#2074
Open
feat: dynamic egress network rule updates for running sandboxes#2074
Conversation
Add API endpoint to dynamically update sandbox egress network rules on a running sandbox. The handler updates the API sandbox store atomically; the orchestrator gRPC call is stubbed for now (stage 2). Includes integration tests for success, not-found, unauthorized, clear rules, replace rules, and multiple sequential updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…Found The previous ID "nonexistent-sandbox-id" had too many hyphens, failing ID validation (400) before reaching the not-found path (404). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add full gRPC flow for runtime sandbox network config updates: - Proto: SandboxUpdateNetworkRequest + UpdateNetwork RPC - Orchestrator: UpdateNetwork handler updates nftables (non-TCP) and in-memory egress config (TCP proxy) with RWMutex thread safety - API: normalize bare IPs to CIDR notation before gRPC call - Slot.UpdateInternet: reset + re-apply firewall rules in sandbox netns - Thread-safe networkEgress field on Metadata (same pattern as endAt) - Integration tests verify actual connectivity changes (not just 204s) - test-network-update Makefile target for convenience Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dedicated test target Remove unused *sandbox.Sandbox return value from UpdateSandboxNetworkConfig since no caller uses it. Remove dedicated test-network-update Makefile target in favor of the generic test/% pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The creation path splits AllowOut entries into CIDRs and domains via ParseAddressesAndDomains, but the update path only handled CIDRs. Now the update path mirrors the creation path: domains are extracted, the default nameserver is added when domains are present, and domains are passed through gRPC to the orchestrator's in-memory config for the TCP proxy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verifies that network egress rules set via PUT /sandboxes/{id}/network
survive a pause/resume cycle — the resumed sandbox should still enforce
the same allow/deny rules.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify that dynamically adding a domain to AllowOut (via TLS SNI matching in the TCP proxy) works, and that removing the domain re-blocks traffic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pdate API docs Consolidate startedAtMu/endAtMu/networkEgressMu into single rwmu. GetNetwork() now always returns non-nil with non-nil Ingress, removing nil checks in proxy.go and tcpfirewall. Update openapi.yml allowOut/denyOut descriptions to document domain support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se nil-safe getters Validate all CIDRs upfront in UpdateInternet to prevent partial firewall state if an invalid CIDR is passed after rules have been reset. Add missing 409 Conflict response to the PUT /network OpenAPI spec. Use GetIngress() nil-safe getter in proxy.go for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
After refactoring GetNetwork() to read egress from the mutex-protected networkEgress field, tests that construct Metadata directly must call SetNetworkEgress() to mirror what the constructor does. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o lev-allow-deny-dynamic
Move the sandbox store mutation after the gRPC call to the orchestrator node. Previously the store was updated first, so a failed gRPC call left the store with the new config while the sandbox still enforced the old rules. Now: Get (read-only) → gRPC → Update (persist). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…at-a-time functions All firewall set mutations (ConfigureInternet, UpdateInternet, ResetInternet, and the constructor) now go through ReplaceUserRules which buffers all 4 set replacements and issues a single atomic nftables Flush. Removes AddAllowedCIDR, AddDeniedCIDR, Reset, ResetDeniedSets, ResetAllowedSets, and addCIDRToSet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The firewall_toolkit's ValidateAddress rejects 0.0.0.0 as "unspecified", which broke sandbox creation with internetAccess=false after the atomic ReplaceUserRules refactoring. The old addCIDRToSet bypassed the toolkit for this case by creating raw nftables elements directly; restore that bypass in clearAndReplaceCIDRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a95c13a2d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Match the create path's validation in the PUT /sandboxes/{id}/network
handler: reject non-IP/CIDR values in denyOut (400 instead of 500),
and require 0.0.0.0/0 in denyOut when allowOut contains domains.
Add integration tests for both cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… table-driven suite Single sandbox, sequential subtests covering: 8 rejected inputs, 10 accepted inputs, 12 firewall rule steps (deny-all, allow precedence, replace semantics, domain/IP/CIDR, clear/reapply, allow-without-deny), and pause/resume rule preservation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dobrac
requested changes
Mar 9, 2026
tests/integration/internal/tests/api/sandboxes/sandbox_network_update_test.go
Show resolved
Hide resolved
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tional apply/revert Combine the separate UpdateNetwork and Update gRPC RPCs into a single Update RPC that handles both end_time and egress changes. Updates are applied sequentially using a functional updater pattern — on failure, already-applied changes are rolled back in reverse order. Publish a SandboxUpdatedEvent for any successful update, including network egress details (allowed/denied CIDRs, allowed domains) in the event data alongside the existing set_timeout field. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The networkEgress field is set explicitly by the Update handler after UpdateInternet succeeds. Initializing it in ResumeSandbox was not needed by any change in this PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous commit (ed87f50) incorrectly removed networkEgress initialization from ResumeSandbox, assuming it was only needed for the dynamic update path. However, the API's Create handler also calls ResumeSandbox, so the egress config was never populated — causing the TCP proxy to see nil egress and allow all traffic through. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dobrac
reviewed
Mar 10, 2026
dobrac
reviewed
Mar 10, 2026
dobrac
reviewed
Mar 10, 2026
Extract buildEgressConfig to deduplicate ParseAddressesAndDomains + nameserver + CIDR conversion logic between create and update paths. Add idna.Lookup.ToASCII validation for domain entries in allowOut. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ates Rollbacks use context.WithoutCancel to ensure they complete even if the request context is canceled. Replaces manual rollback loop in Update RPC. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Single source of truth for network settings — no more duplicate networkEgress field. Config.network is private with GetNetwork, SetNetworkEgress, and SetNetwork methods protected by Config's own RWMutex. Config is now *Config to avoid copylocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip *. prefix before idna.Lookup.ToASCII validation since the wildcard is our own syntax, not a DNS label. Fixes integration test for *.example.com with deny all. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename function and file for clarity. Initialize Config in test sandbox (was nil, causing panic). Fix egress assertion to match GetNetwork() non-nil defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dobrac
requested changes
Mar 11, 2026
…IDRs FlushSet only buffers a "clear set" netlink message — it does NOT commit to the kernel. Clarify this with comments and remove the redundant top-level FlushSet (ClearAndAddElements does its own). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…onfig
Replace direct &sandbox.Config{} + SetNetwork() pattern with
sandbox.NewConfig(sandbox.Config{...}) constructor that normalizes nil
Network to an empty proto and initializes the RWMutex. This removes
nil-guard boilerplate from GetNetwork() and prevents nil-pointer panics
when accessing network sub-fields via proto getters.
Also updates proxy.go to use proto nil-safe GetTrafficAccessToken()
instead of direct field access.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Main's f9e23eb changed Map.Insert/Get signatures (requiring Slot and MarkRunning), which broke test helpers in this file. Rather than fixing all the boilerplate, remove the tests that are redundant with integration tests (NotFound, EndTimeOnly, NoFieldsSet) and keep only the two rollback tests that exercise failure paths not covered by integration: egress failure must not change end_time, and a combined end_time+egress update must revert end_time when egress fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GetNetwork() returned the proto pointer after releasing the read lock, allowing concurrent reads of Egress/Ingress to race with SetNetworkEgress writes. Replace GetNetwork() with GetNetworkEgress/GetNetworkIngress/ SetNetworkEgress that hold the lock while accessing sub-fields. Remove unused SetNetwork method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dobrac
approved these changes
Mar 12, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PUT /sandboxes/{sandboxID}/networkAPI endpoint to dynamically update egress firewall rules (allow/deny CIDRs and domains) on running sandboxes without restartUpdateNetworkRPC → nftables rule replacement in sandbox network namespaceallowOutvia TLS SNI matching in the TCP proxyRWMutexprotectingstartedAt,endAt, andnetworkEgressGetNetwork()guaranteed non-nil, eliminating nil checks in proxy and TCP firewall code pathsTest plan